<!-- SGML Document | 1993-11-24 | 77KB -->
<!doctype qwertz system [
<!entity LaTeX sdata "{\LaTeX}">
<!entity TeX sdata "{\TeX}" >
<!entity et "&etago;">
<!entity bigcup "<mc>\bigcup</>">
<!entity l "[">
<!entity r "]">
]>
<chapt>The <tt>qwertz</> Document Type Definition
All of the <tt>qwertz</> document "styles", except bibliographies,
are defined in a single SGML document type
definition (DTD), called
<tt>qwertz</>. It is essentially an SGML reconstruction of Lamport's &LaTeX
<cite id="Lamport86">. We have not attempted to include every feature
of &LaTeX in this DTD, but have included the features we use
regularly. Others may of course find that something they deem
important is missing. We welcome suggestions for improvements or
extensions.
We will be making use of several <em/parameter entities/ in this
DTD:
<p>
<code>
<!entity % emph
" em | it | bf | sf | sl | tt " >
<!entity % xref
" label | ref | pageref | cite | ncite " >
<!entity % inline
" (#pcdata | f | x | %emph; | sq | %xref)* " >
<!entity % list
" list | itemize | enum | descrip " >
<!entity % par
" %list; | comment | lq " >
<!entity % mathpar " dm | eq " >
<!entity % thrm
" def | prop | lemma | coroll | proof | theorem " >
<!entity % litprog " code | verb " >
<!entity % sectpar
" %par; | figure | tabular | table | %mathpar; |
%thrm; | %litprog; ">
</code>
These are just macros used in the definitions of various elements,
to avoid retyping and to ease maintenance. The <tt/emph/ parameter
lists the various kinds of emphasis. The <tt/xref/ parameter lists the
elements for labels, cross references and citations. The <tt/inline/
parameter is for the elements that may be used anywhere within the
document. The <tt/list/ parameter is for various kinds of lists.
<tt/par/ lists several basic kinds of elements at the level of
paragraphs. The
<tt>mathpar</> parameter includes the elements for <em/displayed/
mathematical formulas. The <tt>thrm</> parameter is for the set of
elements used to represent such things as definitions, theorems and
proofs. The <tt>litprog</> parameter is for literate programming
elements. Finally, the <tt>sectpar</> parameter lists the elements
which may occur at the level of paragraphs within sections (or
chapters). Notice that this parameter uses other parameters.
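<p>As a minimal sketch of how such a parameter entity is used (the
<tt>shout</> element is hypothetical and is not part of the
<tt>qwertz</> DTD), the two declarations
<verb>
<!entity % emph " em | it | bf | sf | sl | tt " >
<!element shout - - (#pcdata | %emph;)* >
</verb>
should have the same effect as writing the content model out in full:
<verb>
<!element shout - - (#pcdata | em | it | bf | sf | sl | tt)* >
</verb>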
Several kinds of documents may be written using &LaTeX: articles,
reports, books, letters and slide (or transparency) presentations. The <tt/qwertz/ DTD
includes two others as well: <tt/notes/, for documents such as notes to yourself which do not
require a title, sections, footnotes and the like; and <tt/manpage/, for Unix manual pages.
<code>
<!element qwertz o o
(sect | chapt | article | report |
book | letter | telefax | slides | notes | manpage ) >
</code>
Notice that sections (<tt>sect</>) and chapters (<tt>chapt</>) may
also be processed separately, before being put together into an
article, report or book.
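<p>For instance, a section kept in a file of its own might look
roughly like this (a sketch only; the heading and the text are
invented for illustration):
<verb>
<!doctype qwertz system>
<sect>A Section Processed Separately&et;>
<p>The text of the section goes here.
</verb>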
&LaTeX also includes Bib&TeX, a program for creating
bibliographies whose entries can be easily cited in &LaTeX documents.
The <tt/qwertz/ document type for this purpose is described in Chapter 5.
<sect>General Purpose Entities and Elements</>
<p> This section describes the SGML entities and elements available in
all <tt/qwertz/ documents.
<code>
<!entity % general system -- general purpose characters -- >
%general;
</code>
<sect1>Character Entities</>
<p>
Most characters are created just by typing the character wanted on
the keyboard. This simple method does not suffice when the character
wanted isn't in the character set available, or at least not
associated with a key on the keyboard, or when the character currently
has special meaning to SGML or, perhaps, &TeX;. In this section, a
fairly large number of general purpose character entities will be
presented. Symbols and characters which may be used only in
mathematical formulas will be discussed separately, in section <ref
id="math">.
When may it be necessary to use an entity reference to produce
some character? There are three cases to watch out for:
<descrip>
<tag>SGML Concrete Syntax Delimiters.</>
Although the SGML standard allows alternative concrete syntaxes to
be defined, we use the so-called <em>reference concrete syntax</> in
the <tt>qwertz</> document types. In this reference syntax, &lt; is
the <em>start tag open</> character, and <tt>&lt;/</> is the <em>end
tag open</> delimiter. The other SGML delimiter authors should be
aware of is &, the <em>entity reference open</> delimiter of the
reference syntax.
The appropriate entity to use to generate these characters depends
on the context. Normally, use <tt>lt</> to represent < and
<tt>amp</> to get &, when these appear in strings which might
otherwise be interpreted as starting tags or entity references.
However, within the <tt>code</> or <tt>verb</> elements for literate
programming, described in section <ref id="litprog">, use the
<tt>ero</> entity to represent & and the <tt>etago</> entity for
the sequence <tt>&etago</>.
<verb>
<!entity lt sdata "<" >
<!entity amp sdata "&" >
<!entity ero sdata "&ero;" >
<!entity etago sdata "&etago;" >
</verb>
<tag>SGML Short Reference Delimiters.</>
In SGML document types <em>short reference maps</> may be defined
which allow single characters to be interpreted as arbitrarily complex
sequences of characters, including SGML tags and entity references.
Thus, to know precisely when a certain character will be interpreted
literally or as a short reference (i.e. macro) for something else, one
has to know which map is in effect in the context of the current
element. Just about all punctuation characters which are not used as
delimiters in the concrete syntax can be used as short reference
delimiters:
<verb>
" # % ' ( ) * + , - : ; = @ [ ] ^ _ { | } ~
</verb>
For each of these characters, there is an SGML entity which may be
used to generate the ASCII character in the printed document, listed
in table <ref id="GPC">. <em>Usually, it will not be necessary to use these
entities; the character can simply be typed and will be interpreted
literally.</> However, if the results are not as expected, check to
see if there is a map in effect at that point in the document in which
the character has been redefined. As maps are associated with
elements, the section in this manual describing an element will also
direct you to a description of the applicable map, if there is one.
As it turns out, one important use of character maps is to generate
exactly the character typed in the printed document. That is, the map
is used to hide the special meaning of the character to the underlying
formatter (e.g. &TeX;), replacing the character with the formatting
instructions for generating the character. This has been the main use
of maps in our <tt>qwertz</> document type definitions.
<verb>
<!entity dquot sdata "&dquot;" >
<!entity num sdata "#" >
<!entity percnt sdata "%" >
<!entity quot sdata """ >
<!entity lpar sdata "(" >
<!entity rpar sdata ")" >
<!entity ast sdata "*" >
<!entity plus sdata "+" >
<!entity comma sdata "," >
<!entity hyphen sdata "‐" >
<!entity colon sdata ":" >
<!entity semi sdata ";" >
<!entity equals sdata "=" >
<!entity commat sdata "@" >
<!entity lsqb sdata "[" >
<!entity rsqb sdata "]" >
<!entity circ sdata "ˆ" >
<!entity lowbar sdata "_" >
<!entity lcub sdata "{" >
<!entity verbar sdata "|" >
<!entity rcub sdata "}" >
<!entity tilde sdata "˜" >
</verb>
<tag>&TeX Special Characters.</>
Ideally, it should be possible to hide the conventions of the
underlying formatting system completely. In fact, SGML parsers which
implement the full ISO standard have a feature which makes this
possible. However, the SGML parser we are using does not include this
feature: the only characters which can serve as short references are
the characters allowed for this purpose by the reference concrete
syntax. Unfortunately, this reference syntax does not allow &,
&dollar and &bsol to be used as short references, which are all
special &TeX characters. Thus, the entities for these three
characters (<tt>amp, dollar</> and <tt>bsol</>) must usually be used
to produce them. (The &dollar and &bsol characters may be used
directly within the <tt>verb</> and <tt>code</> elements, discussed
below in section <ref id="litprog">. Also, within these elements use
the <tt>ero</> entity to represent & in strings which might
otherwise be interpreted as entity references.)
<verb>
<!entity bsol sdata "\" >
<!entity dollar sdata "$" >
</verb>
</descrip>
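<p>The cases above can be illustrated by what one might type in
running text. This is only a sketch; the sentences are invented, and
the block shows the input as typed, not the printed result:
<verb>
If a &ero;lt;b and c &ero;amp;d then the test fails.
The price is 5 &ero;dollar; and the macro is &ero;bsol;section.
About 50% of the items are marked with a #.
</verb>
In the first line, the entities keep the parser from reading a start
tag or an entity reference. The second line uses the entities for two
of the special &TeX characters. In the third line the percent and
number signs are typed directly, on the assumption that the map in
effect hides their special meaning.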
<sect1>Spacing, Dashes and Ellipsis</>
<p>The meaning of the ordinary space character is context sensitive.
Sometimes there is a space <em>within</> a single word. Such spaces
can be typed using the <em>nonbreakable space</> (<tt>nbsp</>) entity
to avoid breaking the word at that point at the end of a line. There
are also contexts where one wants a certain amount of space to appear,
without it being regarded by the formatter as space which may be
shrunk in order to clean up the arrangement of words or characters on
the line. There are three entities for this purpose: <tt>emsp</>
denotes the amount of horizontal space required for the character "M".
An <tt>ensp</> is just half as wide as an <tt>emsp</>, and a <em>thin
space</> (<tt>thinsp</>) is <f>1/6</> of an <tt>emsp</>. Notice that
these are relative amounts, depending on the font being used.
There are also three different kinds of dashes: <tt>hyphen</>, which
was already mentioned above, is to be used for intra-word dashes, as
in the word "intra-word".<footnote>However, the <tt>hyphen</> entity
was not actually necessary here, as the - character was not being used
in this context as a short reference.</footnote> <tt>ndash</> is to be
used for number ranges, such as "23–56", and <tt>mdash</> is an
alternative delimiter for parenthetical comments &mdash certainly
you've seen them used this way &mdash perhaps to avoid too frequent
use of commas or parentheses.
<verb>
<!entity nbsp sdata " " >
<!entity emsp sdata " " >
<!entity ensp sdata " " >
<!entity thinsp sdata " " >
<!entity mdash sdata "—" >
<!entity ndash sdata "–" >
<!entity hellip sdata "…" >
</verb>
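<p>A short sketch of how these entities might be typed (the sentences
are invented for illustration):
<verb>
Figure&ero;nbsp;3 covers pages 23&ero;ndash;56.
The result &ero;mdash; as expected &ero;mdash; was positive &ero;hellip;
Leave an &ero;emsp; between the two words.
</verb>
Within paragraphs the &tilde character can presumably be typed in
place of the <tt>nbsp</> entity, since the paragraph map shown in a
later section maps it to that entity.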
<sect1>Foreign Languages</>
<p>
There is a large set of entities for other Western European
languages. Altogether, there are entities for almost all of the
foreign language characters in ISO 8859, the Latin 1 character set for
Western European languages.<footnote>Only the four Icelandic
characters are missing.</> Conveniently, these entities are all
available in the usual Adobe PostScript fonts, as well as in &TeX;.
Thus, all of the entities defined here can be printed in &TeX;, on
PostScript printers, or displayed on any Latin 1 device. Depending on
the computer and editor, it may also be possible to type these Latin 1
characters directly, instead of having to use these entities. A
simple filter could translate Latin 1 files into ASCII files,
replacing non-ASCII characters by entity references. The entity names
chosen here for these characters conform to the SGML standard.
<verb>
<!entity aacute sdata 'á' >
<!entity Aacute sdata 'Á' >
<!entity acirc sdata 'â' >
<!entity Acirc sdata 'Â' >
<!entity agrave sdata 'à' >
<!entity Agrave sdata 'À' >
<!entity aring sdata 'å' >
<!entity atilde sdata 'ã' >
<!entity Atilde sdata 'Ã' >
<!entity auml sdata 'ä' >
<!entity Auml sdata 'Ä' >
<!entity aelig sdata 'æ' >
<!entity AElig sdata 'Æ' >
<!entity ccedil sdata 'ç' >
<!entity Ccedil sdata 'Ç' >
<!entity eacute sdata 'é' >
<!entity Eacute sdata 'É' >
<!entity ecirc sdata 'ê' >
<!entity egrave sdata 'è' >
<!entity Egrave sdata 'È' >
<!entity euml sdata 'ë' >
<!entity Euml sdata 'Ë' >
<!entity iacute sdata 'í' >
<!entity Iacute sdata 'Í' >
<!entity icirc sdata 'î' >
<!entity Icirc sdata 'Î' >
<!entity igrave sdata 'ì' >
<!entity Igrave sdata 'Ì' >
<!entity iuml sdata 'ï' >
<!entity Iuml sdata 'Ï' >
<!entity ntilde sdata 'ñ' >
<!entity Ntilde sdata 'Ñ' >
<!entity oacute sdata 'ó' >
<!entity Oacute sdata 'Ó' >
<!entity ocirc sdata 'ô' >
<!entity Ocirc sdata 'Ô' >
<!entity ograve sdata 'ò' >
<!entity Ograve sdata 'Ò' >
<!entity oslash sdata 'ø' >
<!entity Oslash sdata 'Ø' >
<!entity otilde sdata 'õ' >
<!entity ouml sdata 'ö' >
<!entity Ouml sdata 'Ö' >
<!entity szlig sdata 'ß' >
<!entity uacute sdata 'ú' >
<!entity Uacute sdata 'Ú' >
<!entity ucirc sdata 'û' >
<!entity ugrave sdata 'ù' >
<!entity Ugrave sdata 'Ù' >
<!entity uuml sdata 'ü' >
<!entity Uuml sdata 'Ü' >
<!entity yacute sdata 'ý' >
<!entity Yacute sdata 'Ý' >
<!entity yuml sdata 'ÿ' >
</verb>
The <tt>qwertz</> document types were developed in a German research
center, so we have included entities for the German characters with
shorter names than the entity names used in the SGML standard. Notice
that these are just synonyms for the standard entities, which are also
included.
<code>
<!entity Ae '&ero;Auml;' >
<!entity ae '&ero;auml;' >
<!entity Oe '&ero;Ouml;' >
<!entity oe '&ero;ouml;' >
<!entity Ue '&ero;Uuml;' >
<!entity ue '&ero;uuml;' >
<!entity sz '&ero;szlig;' >
</code>
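<p>For example, given the synonyms above, the following two input
lines should produce the same printed text (a sketch; the words are
merely sample German text):
<verb>
Die Stra&ero;sz;e nach M&ero;ue;nchen
Die Stra&ero;szlig;e nach M&ero;uuml;nchen
</verb>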
<sect1>Other Symbols</>
<p>
Finally, there are entities for a few miscellaneous symbols,
such as §, ¶, ©, ¬, ÷, ±, ×, and
μ. All of these entities name symbols in the Latin 1 character
set. They may be used anywhere within a document. (In particular, the
mathematical symbols shown here need not be within one of the formula
elements described below, in section <ref id="math">.) The entity
names for these, and all the other character entities discussed above,
are listed in table <ref id="GPC">. <em/A document which does not
include mathematical formulas or graphics and which uses only the character
entities defined in this chapter can be displayed or printed using a
single Latin 1 font/.
<verb>
<!entity gt sdata ">" >
<!entity sect sdata "§">
<!entity para sdata "¶">
<!entity copy sdata "©">
<!entity iexcl sdata "¡" >
<!entity iquest sdata "¿" >
<!entity cent sdata "¢" >
<!entity pound sdata "£" >
<!entity not sdata "¬" >
<!entity divide sdata "÷" >
<!entity plusmn sdata "±" >
<!entity times sdata "×" >
<!entity mu sdata "μ" >
</verb>
<table>
<tabular ca="ll|ll|ll|ll">
AElig | Æ |
Aacute | Á |
Acirc | Â |
Ae | &Ae @
Agrave | À |
Atilde | Ã |
Auml | Ä |
Ccedil | Ç @
Eacute | É |
Egrave | È |
Euml | Ë |
Iacute | Í @
Icirc | Î |
Igrave | Ì |
Iuml | Ï |
Ntilde | Ñ @
Oacute | Ó |
Ocirc | Ô |
Oe | &Oe |
Ograve | Ò @
Oslash | Ø |
Ouml | Ö |
Uacute | Ú |
Ue | &Ue @
Ugrave | Ù |
Uuml | Ü |
Yacute | Ý |
aacute | á @
acirc | â |
ae | &ae |
aelig | æ |
agrave | à @
amp | & |
aring | å |
ast | &ast |
atilde | ã @
auml | ä |
bsol | &bsol |
ccedil | ç |
cent | ¢ @
circ | &circ |
colon | &colon |
comma | &comma |
commat | &commat @
copy | © |
divide | ÷ |
dollar | &dollar |
dquot | &dquot @
eacute | é |
ecirc | ê |
egrave | è |
emsp | @
ensp | |
equals | &equals |
euml | ë |
gt | > @
hellip | &hellip |
hyphen | &hyphen |
iacute | í |
icirc | î @
iexcl | ¡ |
igrave | ì |
iquest | ¿ |
iuml | ï @
lcub | &lcub |
lowbar | &lowbar |
lpar | &lpar |
lsqb | &lsqb @
lt | < |
mdash | &mdash |
mu | &mu |
nbsp | @
ndash | &ndash |
not | ¬ |
ntilde | ñ |
num | &num @
oacute | ó |
ocirc | ô |
oe | &oe |
ograve | ò @
oslash | ø |
otilde | õ |
ouml | ö |
para | ¶ @
percnt | &percnt |
plus | &plus |
plusmn | ± |
pound | £ @
quot | " |
rcub | &rcub |
rpar | &rpar |
rsqb | &rsqb @
sect | § |
semi | &semi |
sz | &sz |
szlig | ß @
thinsp | |
tilde | &tilde |
times | × |
uacute | ú @
ucirc | û |
ue | &ue |
ugrave | ù |
uuml | ü @
verbar | &verbar |
yacute | ý |
yuml | ÿ |
</tabular>
<caption><label id="GPC">General Purpose Characters</caption>
</table>
<sect1>Sentences, Paragraphs, Emphasis and Quotations
<p>
Sentences need not be marked up with tags. There is no
<tt>sentence</> element as such. Rather, these are marked implicitly
using the usual conventions for beginning and ending sentences.
Paragraphs are delimited with the <tt/p/ tag. Both the starting tag
and ending tag are optional.
<code>
<!element p o o ( %inline | %sectpar )+ >
<!entity ptag '<p>' >
<!entity psplit '&etago;p><p>' >
<!shortref pmap
"&ero;#RS;B" null
"&ero;#RS;B&ero;#RE;" psplit
"&ero;#RS;&ero;#RE;" psplit
'"' qtag
"[" ftag
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar >
<!usemap pmap p>
</code>
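<p>To illustrate the effect of this map, here is a sketch with
invented sentences. It assumes that the <tt>qtag</> entity, which is
not shown in this section, expands to the <tt>sq</> start tag. The
input
<verb>
<p>The first paragraph has a "short quote" and 100% literal text.

The blank line above starts a second paragraph.
</verb>
is then treated roughly as if the author had typed
<verb>
<p>The first paragraph has a <sq>short quote&et;sq> and 100&ero;percnt; literal text.
&et;p><p>The blank line above starts a second paragraph.
</verb>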
Sentences or phrases within paragraphs can be emphasized in a number
of ways. The <tt>em</> tag is used to choose the default form of
emphasis, which is usually <em>italic</> type, but depends on the
style of the background text. If the background text is formatted in
italic type, as it usually is in definitions, for example, then
emphasized text will be formatted using a plain, roman typeface.
However, various forms of emphasis can be explicitly chosen. These
include: <bf>bold face</> (<tt>bf</>), <it>italics</> (<tt>it</>),
<sf>sans serif</> (<tt>sf</>), <sl>slanted</> (<tt>sl</>), and
<tt>typewriter</> (<tt>tt</>) styles.
<code>
<!element em - - (%inline)>
<!element bf - - (%inline)>
<!element it - - (%inline)>
<!element sf - - (%inline)>
<!element sl - - (%inline)>
<!element tt - - (%inline)>
</code>
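<p>A brief sketch of how these tags might appear in the input (the
sentence is invented for illustration):
<verb>
This point is <em>important&et;>, this one is <bf>bold&et;>,
and <tt/printf/ is the name of a function.
</verb>
Both the empty end tag and the shorter form with slashes are used
throughout this manual; either should do for short phrases.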
The <tt>tt</> element simulates a "typewriter". That is, with a
couple of exceptions, characters are printed exactly as they appear on
the display. This is useful for including small segments of computer
code within paragraphs. See section <ref id="litprog"> on literate
programming for more information.
Sentences within paragraphs can be quoted using the <em>short
quote</> (<tt>sq</>) tag, as in <tt>&lt;sq>The rain in Spain falls
mainly on the plain.&lt;/></tt>, but this is usually not necessary. In
most contexts where one will want to use quotations, there is a map
allowing the &dquot symbol to be used as a short reference for both
the starting and ending <tt>sq</> tags. So one can just type:
<tt>"The rain in Spain falls mainly on the plain."</>
Quotations extending over a number of paragraphs are marked using the
<em>long quote</> (<tt>lq</>) element. Long quotes are formatted in
&LaTeX by indenting the left and right margins. For example, <ncite
id="Lamport86" note="pp. xiii">:
<lq>
<p>The &LaTeX document preparation system is a special version of
Donald Knuth's &TeX program. &TeX is a sophisticated program designed
to produce high-quality typesetting, especially for mathematical text.
&hellip
&LaTeX represents a balance between functionality and ease of use.
Since I implemented most of it myself, there was also a continual
compromise between what I wanted to do and what I could do in a
reasonable amount of time. &hellip
</lq>
<code>
<!element sq - - (%inline)>
<!entity ftag '<f>' -- formula begin -- >
<!entity qendtag '&et;sq>'>
<!shortref sqmap
"&ero;#RS;B" null
'"' qendtag
"[" ftag
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar >
<!usemap sqmap sq >
<!element lq - - (p*)>
</code>
<sect1>Lists
<p>
Four types of lists are supported, which differ according to the
type of label used to mark each item in the list. Use <tt>itemize</>
to create a list in which each item is marked with some symbol such as
a dash or bullet. The <tt>enum</> tag is used to create an
enumeration, i.e. a list in which each item is labelled with a number
(or letter) indicating its rank or position in the list. The
<tt/list/ type of list does not label the items at all. Finally, use
<tt>descrip</> to create a list in which each item is labelled by some
tag of your own choice. Lists of various types can be nested. For
example:
<verb>
<itemize>
<item>
A level one item.
<item> Here's level two:
<enum>
<item> A level two item.
<item> Here's level three:
<enum>
<item> A level three item.
<item>Here's level four:
<descrip>
<tag/Red./ Is the color of my true love's hair.
<tag/Blue./ Is a property of some movies.
<tag/Yellow./ Characterizes some forms of journalism.
&et;descrip>
<item>A last level three item
&et;enum>
<item>A last level two item
&et;enum>
<item>A last level one item.
&et;itemize>
</verb>
This is formatted by &LaTeX; as:
<itemize>
<item>
A level one item.
<item> Here's level two:
<enum>
<item> A level two item.
<item> Here's level three:
<enum>
<item> A level three item.
<item>Here's level four:
<descrip>
<tag/Red./ Is the color of my true love's hair.
<tag/Blue./ Is a property of some movies.
<tag/Yellow./ Characterizes some forms of journalism.
</descrip>
<item>A last level three item
</enum>
<item>A last level two item
</enum>
<item>A last level one item.
</itemize>
<code>
<!element itemize - - (item+)>
<!element list - - (item+)>
<!element enum - - (item+)>
<!element descrip - - ((tag?, (%inline; | %sectpar;)*, p*)+) >
<!element item o o ((%inline; | %sectpar;)*, p*) >
<!element tag - o (%inline)>
<!usemap global (list,itemize,enum,descrip)>
</code>
For reasons having to do with our translation into &LaTeX, line feeds
within <tt>tag</> elements are translated into spaces, using the
<tt>oneline</> short reference map:
<!--
<!shortref bodymap
"&ero;#RS;B&ero;#RE;" ptag
"&ero;#RS;&ero;#RE;" ptag
'"' qtag
"[" ftag
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>
-->
<code>
<!entity space " ">
<!entity null "">
<!shortref oneline
"&ero;#RS;&ero;#RE;" null
"&ero;#RS;B&ero;#RE;" null
'"' qtag
"[" ftag
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>
<!usemap oneline tag>
</code>
<sect1>Figures and Tables</>
<p>Figures and tables are floating elements; they may appear at a
different location in the printed version of the document than in the
input file. There is a location (<tt/loc/) attribute, which can be
used to influence the location chosen by the formatter. The value of
the <tt/loc/ attribute is a string of up to four letters, where each
letter declares a location at which the figure or table may appear, as
follows:
<descrip>
<tag/<tt/h/./ At the same relative location as it appears in the SGML
input file (i.e. <em/here/).
<tag/<tt/t/./ At the <em/top/ of a page.
<tag/<tt/b/./ At the <em/bottom/ of a page.
<tag/<tt/p/./ On a separate <em/page/ containing only figures and tables.
</descrip>
The default value of the <tt/loc/ attribute is <tt/tbp/.
A <tt>figure</> is a graphic combined with an optional caption. Two
types of figures are currently supported. The first, and easiest, is
to use the <tt>eps</> tag to include an Encapsulated PostScript file
in the document. Encapsulated PostScript files are centered
horizontally on the page. The size of the graphic is its "natural"
size; i.e. the size it would have if printed directly on a PostScript
printer. You need only know the name of the file containing the
graphic.
Encapsulated PostScript graphics can be created using a variety of
different editors. If you are using Unix with an X11-based graphical
user-interface, you may want to try <tt>idraw</>, which stores its
documents directly as Encapsulated PostScript files. Other interesting
X11-based drawing programs are <tt/xfig/ and <tt/tgif/.
For example, to include the graphic contained in an Encapsulated
PostScript file named <tt>issues.ps</>, you would type:
<verb>
<figure>
<eps file="issues">
<caption>An <tt>idraw&et;> Drawing &et;>
&et;figure>
</verb>
This would then appear as in figure <ref id="issues">.
<figure>
<eps file="issues">
<caption><label id="issues">An <tt>idraw</> Drawing </>
</figure>
Notice that the ".ps" extension is <em>not</> to be included in the
file attribute of the <tt>eps</> element, but that the actual file
must include the ".ps" extension.
The second possibility is to use the <em/placeholder/ (<tt>ph</>)
tag to leave space in which to later paste the graphic, in the old,
reliable manner. For example, to leave 10 cm of space for
some graphic, type:
<verb>
<figure>
<ph vspace="10cm">
&et;figure>
</verb>
Be sure not to leave a space between the number and the unit of
measurement used, which may be <tt>cm</>, <tt>mm</> or <tt>in</>.
<code>
<!element figure - - ((eps | ph ), caption?)>
<!attlist figure
loc cdata "tbp">
<!element eps - o empty >
<!attlist eps
file cdata #required>
<!element ph - o empty >
<!attlist ph
vspace cdata #required>
<!element caption - o (%inline)>
<!usemap oneline caption>
</code>
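<p>As a sketch of how the <tt/loc/ attribute declared above might be
combined with the <tt>eps</> element (the caption text is invented;
the file name is the one used in the earlier example):
<verb>
<figure loc="ht">
<eps file="issues">
<caption>Placed here if possible, otherwise at the top of a page&et;>
&et;figure>
</verb>
With this value the formatter is asked to try the current position
first and then the top of a page. Whether the request can be honoured
is left to the formatter.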
Next, there is a <tt>tabular</> element. Using &LaTeX;, tabulars
must be small enough to fit on a single page. The current
<tt>tabular</> element has been kept quite simple. It certainly does
not (yet) offer all the flexibility of &LaTeX;. However, it may well
be that it is sufficient for most users. More complex tables can,
depending on your choice of formatters, be created using &LaTeX or
Unix's <tt/tbl/ program, with the <tt>x</> element, or with any
program capable of generating Encapsulated PostScript, which can then
be included using an <tt>eps</> element.
A <tt>tabular</> consists of a number of rows, separated by the
<tt>rowsep</> element, each of which consists of a number of columns
separated by the <tt>colsep</> element.
The format of the tabular is controlled by the <em>column
alignment</> (<tt>ca</>) attribute. For each column in the tabular
there is a letter in the <tt>ca</> attribute: 1) <tt>c</> for
centered; 2) <tt>l</> for flush left; or 3) <tt>r</> for flush right.
In addition, &verbar can be used to insert vertical lines running the
complete height of the table. This will be made clear in the example
which is coming shortly.
First, however, let me describe the short reference map defined for
tabulars. Rather than typing <tt>&lt;colsep></> and
<tt>&lt;rowsep></> explicitly, one can just type &verbar to separate
columns, and &commat to separate rows. Also, within tabulars, &lsqb
can be used to start a mathematical formula, and &dquot starts short
quotes as usual. (The other short references just hide any special
meaning the character may have to &TeX;.)
<code>
<!entity % tabrow "(%inline, (colsep, %inline)*)" >
<!element tabular - -
(%tabrow, (rowsep, hline?, %tabrow)*, caption?) >
<!attlist tabular
ca cdata #required>
<!element rowsep - o empty>
<!element colsep - o empty>
<!element hline - o empty>
<!entity rowsep "<rowsep>">
<!entity colsep "<colsep>">
<!shortref tabmap
"&ero;#RE;" null
"&ero;#RS;&ero;#RE;" null
"&ero;#RS;B&ero;#RE;" null
"&ero;#RS;B" null
"B&ero;#RE;" null
"BB" null
"&ero;#SPACE;" null
"&ero;#TAB;" null
"@" rowsep
"|" colsep
"[" ftag
'"' qtag
"_" thinsp
"~" nbsp
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub >
<!usemap tabmap tabular>
</code>
The <tt>hline</> element can be used to draw a horizontal line along
the length of the table, to separate rows.
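<p>A small sketch of a tabular with a horizontal line (the contents
are invented). Following the content model above, the <tt>hline</>
tag is placed directly after a row separator:
<verb>
<tabular ca="l|r">
Item | Pages @ <hline>
Manual | 120 @
Index | 8
&et;tabular>
</verb>
This should give two columns, the first flush left and the second
flush right, divided by a vertical line, with a horizontal line
presumably drawn below the first row.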
A <tt/table/ element consists of a <tt>tabular</> followed by an
optional <tt>caption</>. Unlike a tabular, a <tt/table/ is a floating
"body", like a figure. It may be moved to another (near) location
within the formatted document. A <tt/tabular/, however, appears at
the same place in the formatted document as in the SGML source file.
<code>
<!element table - - (tabular, caption?) >
<!attlist table
loc cdata "tbp">
</code>
Here is how table <ref id="GPC"> was typed:
<verb>
<table>
<tabular ca="ll|ll">
ae | &ero;ae | Ae | &ero;Ae @
oe | &ero;oe | Oe | &ero;Oe @
ue | &ero;ue | Ue | &ero;Ue @
sz | &ero;sz | amp | &ero;amp @
bsol | &ero;bsol | circ | &ero;circ @
.
.
.
Dagger | &ero;Dagger | sect | &ero;sect @
para | &ero;para | copy | &ero;copy @
mdash | &ero;mdash | tilde | &ero;tilde
&et;tabular>
<caption><label id="GPC">
General Purpose Characters
&et;caption>
&et;table>
</verb>
<sect1><heading><label id="litprog">Literate Programming</>
<p>
The original motivation behind the development of these document
types was to create an environment for literate programming in an
arbitrary programming language similar to Donald Knuth's WEB system
for literate programming in Pascal <cite id="Knuth84">. The basic
idea is to include the source code of a program inside of its
documentation, instead of the other way around: including comments
within the source code.
The features offered here to support literate programming, or merely
the documentation of existing programs, have been kept to a minimum.
Snippets of code can be mentioned within sentences using the <tt>tt</>
tag. These are formatted using a <tt>typewriter</> font suitable for
program code, but the spacing and indentation of the code is not
retained. Within <tt/tt/ elements, the only characters which may not
be literally interpreted are &dollar, &bsol, &, and <tt>&lt;/</>.
For the &dollar and &bsol symbols, always use the <tt>dollar</> and
<tt>bsol</> entities. For the & and < symbols, use the
<tt>amp</> and <tt>lt</> entities if the string in which they occur
could be mistaken for an entity reference, an element start tag or an
element end tag.
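<p>For instance, the following input lines show these entities used
inside <tt/tt/ elements (a sketch; the sentences are invented):
<verb>
Set <tt>&ero;dollar;HOME&et;> first, then pass <tt>&ero;amp;argc&et;> to the function.
The macro <tt>&ero;bsol;section&et;> starts a new section.
</verb>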
To include larger segments of code, retaining its line breaks,
tabulation and spacing, use the <tt>code</> tag or the <tt>verb</>
tag. Within these tags just about all characters are interpreted
literally. The exceptions are:
<enum>
<item>As SGML entities may be used within <tt>verb</> and
<tt>code</> elements, use the <tt>ero</> entity to represent the &
symbol in strings which might otherwise be mistaken for entity
references. (Notice that the <tt>amp</> entity is not used to represent &
in this context.)
<item> As there must be some way of ending such elements,
use the <tt>etago</> entity to represent <tt>&et</> in strings
which might otherwise be interpreted as end tags. (Do not use the
<tt>lt</> entity for this purpose here.) Start tags can be typed
literally in this context, without using entities.
<item>Unfortunately, &TeX peeks through a bit here
as well: the string <tt>&bsol;end{verbatim}</> may not occur within
<tt>code</> or <tt>verb</> elements. Presumably this will not often
be a problem.
</enum>
For example, to include the "hello world" C program in a document,
just type:
<verb>
<code>
main ()
{
    /* This is the famous hello world program */
    printf("hello world\n");
}
&et;code>
</verb>
When formatted, spaces and line breaks are preserved:
<verb>
main ()
{
    /* This is the famous hello world program */
    printf("hello world\n");
}
</verb>
</verb>
Notice that no entities were required in this code.
With few exceptions, it should be possible to just wrap <tt>verb</> or
<tt>code</> tags around existing pieces of code without change.
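<p>The exceptions listed above do come up occasionally in C. The
following sketch (an invented fragment) shows what would be typed in
two such cases:
<verb>
<code>
int x = 1;
int *p = &ero;ero;x;
puts("&ero;etago;b>");
&et;code>
</verb>
Without the <tt>ero</> entity, the second line would be read as a
reference to an entity named <tt>x</>. Without the <tt>etago</>
entity, the string in the third line would be read as an end tag.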
The idea of literate programming is that the documentation
<em>is</> the program, so there must be some way of extracting the
source code from the SGML document. Just how to do this is described
in chapter <ref id="UC">, below.
The user must have a means of indicating which pieces of code are
to be included in the source code, and in which order. Our solution
to this problem is very simple: <em>Only <tt>code</> elements are to
be extracted, and they are extracted in the same order as they appear
in the document.</> That is, <tt>verb</> elements are <em>not</>
extracted, and may be used, e.g., for examples or draft versions of
the code included for explanatory or tutorial purposes.
<tt>code</> and <tt>verb</> elements may be formatted differently.
Using our translation into &LaTeX, for example, <tt>code</> elements
are distinguished by being bracketed by lines the width of the page.
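<p>As a sketch of this rule (the declarations are invented), only the
second of the following two elements would end up in the extracted
source file, although both appear in the printed document:
<verb>
<verb>
int old_version;
&et;verb>
<code>
int new_version;
&et;code>
</verb>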
<code>
<!element code - - rcdata>
<!element verb - - rcdata>
<!shortref ttmap
"&ero;#RS;B" null
'#' num
'%' percnt
'~' tilde
'_' lowbar
'^' circ
'{' lcub
'}' rcub
'|' verbar >
<!usemap ttmap tt>
</code>
<sect1><heading><label id="math">Mathematical Formulas</>
<p>
The <tt>qwertz</> document types include elements for describing
mathematical formulas completely within SGML, similar to the system
described in <cite id="daphne89">. To start, there is a fairly large
number of entities for mathematical symbols. (The entities chosen are
those for the symbols available both in &TeX and in the PostScript
Symbol font.) Although this may be a minor irritation for
seasoned &TeX users, we have decided to follow the naming conventions
for mathematical symbols adopted in the SGML Standard <cite
id="Smith88">. The complete set of mathematical symbols currently
defined, including the Greek alphabet, is listed in
tables <ref id="mathsym"> and <ref id="greek">, in alphabetical
order.
<code>
<!entity % math system -- math symbols -- >
%math;
</code>
<table>
<tabular ca="ll|ll|ll|ll">
Prime | [&Prime] |
aleph | [&aleph] |
and | [&and] |
ang | [&ang] @
ap | [&ap] |
arr | [&darr] |
bottom | [&bottom] |
bull | [&bull] @
cap | [&cap] |
cir | [&cir] |
clubs | [&clubs] |
congr | [&congr] @
cup | [&cup] |
diams | [&diams] |
divide | [&divide] |
dot | [&dot] @
empty | [&empty] |
equiv | [&equiv] |
exist | [&exist] |
forall | [&forall] @
ge | [&ge] |
hArr | [&hArr] |
harr | [&harr] |
hearts | [&hearts] @
image | [&image] |
infin | [&infin] |
isin | [&isin] |
lArr | [&lArr] @
lang | [&lang] |
larr | [&larr] |
le | [&le] |
mid | [&mid] @
minus | [&minus] |
nabla | [&nabla] |
ne | [&ne] |
nequiv | [&nequiv] @
not | [&not] |
notin | [&notin] |
nsub | [&nsub] |
nsube | [&nsube] @
nsup | [&nsup] |
nsupe | [&nsupe] |
nvDash | [&nvDash] |
nvdash | [&nvdash] @
oplus | [&oplus] |
or | [&or] |
otimes | [&otimes] |
part | [&part] @
plusmn | [&plusmn] |
prime | [&prime] |
prop | [&prop] |
rArr | [&rArr] @
rang | [&rang] |
rarr | [&rarr] |
real | [&real] |
setmn | [&setmn] @
spades | [&spades] |
square | [&square] |
sub | [&sub] |
sube | [&sube] @
sup | [&sup] |
supe | [&supe] |
times | [&times] |
uArr | [&uArr] @
uarr | [&uarr] |
vDash | [&vDash] |
vdash | [&vdash] @
</tabular>
<caption><label id="mathsym">Math Symbols</>
</table>
<table>
<tabular ca="ll|ll|ll">
alpha | [&alpha] |
beta | [&beta] |
gamma | [&gamma] @
Gamma | [&Gamma] |
delta | [&delta] |
Delta | [&Delta] @
epsi | [&epsi] |
zeta | [&zeta] |
eta | [&eta] @
thetas | [&thetas] |
Theta | [&Theta] |
iota | [&iota] @
kappa | [&kappa] |
lambda | [&lambda] |
mu | [&mu] @
nu | [&nu] |
xi | [&xi] |
Xi | [&Xi] @
pi | [&pi] |
Pi | [&Pi] |
rho | [&rho] @
sigma | [&sigma] |
sigmav | [&sigmav] |
Sigma | [&Sigma] @
tau | [&tau] |
upsi | [&upsi] |
Upsi | [&Upsi] @
phis | [&phis] |
Phi | [&Phi] |
chi | [&chi] @
psi | [&psi] |
Psi | [&Psi] |
omega | [&omega] @
Omega | [&Omega]
</tabular>
<caption><label id="greek">Greek Letters</>
</table>
&TeX symbols not in table 2 may nonetheless be generated, by
defining an entity using the <tt>mc</> element. For example, to print
the <x>$\leadsto$</x> symbol, you could first define an entity,
perhaps using the name adopted for this symbol in the SGML standard:
<verb>
<!entity rarrw "<mc/<x/\leadsto//">
</verb>
Of course, this approach is &TeX dependent. But this dependency is
clearly noted at the beginning of the document, and it would be an
easy matter to replace the &TeX command for such entities with the
appropriate commands for some other formatter.
The <tt>mc</> tag used in this entity definition is for <em>math
characters</>. The entity could have been defined using only the <tt>x</>
tag described in section <ref id="misc">, but it is "safer" to use the
<tt>mc</> tag when defining entities which are only to be used within
formulas, as the SGML parser will complain if they are used elsewhere.
If <tt>x</> were used instead, such errors would first be caught by
&TeX;.
<code>
<!element mc - - cdata >
</code>
There are a number of parameters for formulas. These will most
likely be of little interest to most users, but are stated here for
the sake of completeness.
<code>
<!entity % sppos "tu" >
<!entity % fcs "%sppos;|phr" >
<!entity % fcstxt "#pcdata|mc|%fcs;" >
<!entity % fscs "rf|v|fi" >
<!entity % limits "pr|in|sum" >
<!entity % fbu "fr|lim|ar|root" >
<!entity % fph "unl|ovl|sup|inf" >
<!entity % fbutxt "(%fbu;) | (%limits;) |
(%fcstxt;)|(%fscs;)|(%fph;)" >
<!entity % fphtxt "p|#pcdata" >
</code>
There are three elements for representing formulas: <tt>f</>, for
ordinary short formulas appearing "in-line"; <tt>dm</> for
<em>displayed formulas</> to be centered on a line (or lines) by
themselves; and <tt>eq</> for displayed formulas which are to be
numbered sequentially throughout the document (i.e. so-called
"equations").
<code>
<!element f - - ((%fbutxt;)*) -(footnote) >
<!entity fendtag '&et;f>' -- formula end -- >
<!shortref fmap
"&ero;#RS;B" null
"&ero;#RS;B&ero;#RE;" null
"&ero;#RS;&ero;#RE;" null
"_" thinsp
"~" nbsp
"]" fendtag
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>
<!usemap fmap f >
<!element dm - - ((%fbutxt;)*) -(footnote)>
<!element eq - - ((%fbutxt;)*) -(footnote)>
<!shortref dmmap
"&ero;#RE;" space
"_" thinsp
"~" nbsp
"]" fendtag
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>
<!usemap dmmap (dm,eq)>
</code>
Usually it is not necessary to type the starting and ending tags of
the <tt>f</> element explicitly: &lsqb and &rsqb are short
reference delimiters, allowing one to simply type, for example,
<tt>[&alpha &rarr &beta]</>, instead of
<tt><f>&alpha &rarr &beta</f></tt> to represent
[&alpha &rarr &beta].<footnote>&TeX users will appreciate that this
notation is no more verbose than &TeX;.</footnote>
The only characters of interest in <tt>fmap</> are &lowbar, &tilde
and ]. &lowbar is a short reference for <tt>thinsp</>, which adds a
little extra horizontal space. &tilde means <tt>nbsp</>, which in
turn denotes a non-breaking space. &TeX will not start a new line at
a <tt>nbsp</>. Finally, ] is used to end the formula. The other
characters in this map just protect us from any special meaning &TeX
gives them.
The <tt>dmmap</> is much the same as the <tt>fmap</>. There are
just two differences: 1) ] is not a short reference for the <tt>f</>
closing tag (and instead has its literal meaning), and 2) carriage
returns and new lines are replaced by spaces, for reasons having to do
with the way &TeX formats formulas. Use the <tt>tu</> element,
defined a bit later, to force line breaks in formulas.
Of course, formulas consist of more than just a string of math
symbols. There are elements for representing fractions (<tt>fr</>),
products (<tt>pr</>), integrals (<tt>in</>), sums (<tt>sum</>), roots
(<tt>root</>) and arrays (<tt>ar</>). Each of these will be described
next.
A fraction consists of a numerator (<tt>nu</>) and a denominator
(<tt>de</>). For example, [12/37] can be written as:
<verb>
[<fr><nu>12<de>37&et;fr>]
</verb>
Of course, this is rather lengthy. For simple fractions such as
this, you may prefer to just type <tt>[12/37]</>, which is
formatted by &LaTeX in the same way.<footnote>On the other hand, if
you are a SGML purist, you may prefer not to do this, as it makes
assumptions about the formatting system being used.</footnote>
<code>
<!element fr - - (nu,de) >
<!element nu o o ((%fbutxt;)*) >
<!element de o o ((%fbutxt;)*) >
</code>
Products, integrals and sums all have a similar structure,
consisting of a <em>lower limit</> (<tt>ll</>), an <em>upper limit</>
(<tt>ul</>) and an optional <em>operand</> (<tt>opd</>).
<code>
<!element ll o o ((%fbutxt;)*) >
<!element ul o o ((%fbutxt;)*) >
<!element opd - o ((%fbutxt;)*) >
<!element pr - - (ll,ul,opd?) >
<!element in - - (ll,ul,opd?) >
<!element sum - - (ll,ul,opd?) >
</code>
So, for example,
<dm>
<sum><ll>i=1<ul>n<opd>x<inf>i</></sum> =
<in><ll>0<ul>1<opd>f</in>
</dm>
was typed as:
<verb>
<dm>
<sum><ll>i=1<ul>n<opd>x<inf>i&et>&et;sum> =
<in><ll>0<ul>1<opd>f&et;in>
&et;dm>
</verb>
This example also shows how to represent subscripts, using the
<tt>inf</> tag. There is also a <tt>sup</> tag for superscripts.
For operators with upper and lower limits other than products, sums
or integrals, use the <tt>lim</> element.
<code>
<!element lim - - (op,ll,ul,opd?) >
<!element op o o (%fcstxt;|rf|%fph;) -(tu) >
</code>
For example,
<dm>
<lim><op>&bigcup<ll>i=0<ul>n<opd>{&alpha<inf>i</> &rarr &beta}</lim>
</dm>
was typed as
<verb>
<!entity bigcup "<mc>\bigcup&et;>">
...
<dm>
<lim>&ero;bigcup<ll>i=0<ul>n&et;>
<opd>{&ero;alpha<inf>i&et;> &ero;rarr &ero;beta}&et>
&et;lim>
&et;dm>
</verb>
Notice that it isn't necessary to type the <tt>op</> tag here.
Roots can be represented using the, what else, <tt>root</> element.
By default, <tt>root</> produces square roots. The <tt>n</> attribute
of <tt>root</> can be used for other roots. For example, type
<tt>[<root n=3/x+y/]</tt> to get [<root n=3/x+y/].
<code>
<!element root - - ((%fbutxt;)*) >
<!attlist root
n cdata "">
</code>
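Fractions, roots and superscripts can be nested. The following is only
a sketch, assuming the declarations given in this section, of how the
familiar solution formula for quadratic equations might be typed as a
displayed formula:
<verb>
<dm>
x = <fr><nu>-b &ero;plusmn <root>b<sup>2&et;sup> - 4ac&et;root>
<de>2a&et;fr>
&et;dm>
</verb>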
Arrays, or matrices, consist of a sequence of rows, each of which
contains a sequence of columns. Every row in the array must contain
the same number of columns. Rows are <em>separated</> by the
<tt>arr</> tag; columns by the <tt>arc</> tag. The array itself is
delimited by the <tt>ar</> tag.
<code>
<!element col o o ((%fbutxt;)*) >
<!element row o o (col, (arc, col)*) >
<!element ar - - (row, (arr, row)*) >
<!attlist ar
ca cdata #required >
<!element arr - o empty >
<!element arc - o empty >
</code>
This is a place where an SGML short reference map has proven
useful:
<code>
<!entity arr "<arr>" >
<!entity arc "<arc>" >
<!shortref arrmap
"&ero;#RE;" space
"@" arr
"|" arc
"_" thinsp
"~" nbsp
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub >
<!usemap arrmap ar >
</code>
Columns can be separated using the &verbar character; rows
with the &commat character.
For example, this matrix
<dm>
<ar ca=clcr>
a+b+c | uv | x-y | 27 @
a+b | u+v | z | 134 @
a | 3u+vw | xyz | 2,978
</ar>
</dm>
was typed as:
<verb>
<ar ca=clcr>
a+b+c | uv | x-y | 27 @
a+b | u+v | z | 134 @
a | 3u+vw | xyz | 2,978
&et;ar>
</verb>
The <em>column alignment</> of an array must be specified using the
<tt>ca</> attribute, as shown in the example. For each column in the
array, there is a letter in the <tt>ca</> attribute. There are three
alternatives: 1) <tt>c</> for centered; 2) <tt>l</> for flush left;
and 3) <tt>r</> for flush right.
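For instance, a sketch of a two by two identity matrix, with both
columns centered, needs a <tt>ca</> value of two letters:
<verb>
<dm>
<ar ca=cc>
1 | 0 @
0 | 1
&et;ar>
&et;dm>
</verb>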
There remain a few miscellaneous math elements to describe.
<tt>sup</> and <tt>inf</>, for superscripts and subscripts, were
mentioned above. <tt>unl</> and <tt>ovl</> can be used to
<em>underline</> or <em>overline</> formulas. <tt>rf</> is used for
identifiers, such as function names (e.g. <tt>cos</> or <tt>sin</>)
within formulas. Similarly, <tt>phr</> is used to delimit phrases of
ordinary text within formulas. (Both of these are necessary, as
strings of characters within formulas denote sequences of variables,
not words.) The <tt>v</> tag can be used to denote a <em>vector</>,
as in [<v>x</>]. Calligraphic characters, such as [<fi>L</>], can be
denoted using the <tt>fi</> tag. Finally, line breaks can be inserted
into formulas using the <tt>tu</> element.
<code>
<!element sup - - ((%fbutxt;)*) -(tu) >
<!element inf - - ((%fbutxt;)*) -(tu) >
<!element unl - - ((%fbutxt;)*) >
<!element ovl - - ((%fbutxt;)*) >
<!element rf - o (#pcdata) >
<!element phr - o ((%fphtxt;)*) >
<!element v - o ((%fcstxt;)*)
-(tu|%limits;|%fbu;|%fph;) >
<!element fi - o (#pcdata) >
<!element tu - o empty >
<!usemap global (rf,phr)>
</code>
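A few sketches combining these elements follow. The formulas
themselves are arbitrary and were chosen only for illustration. Note
that the <tt>rf</> and <tt>phr</> elements are closed explicitly here,
so that the formula's own short reference map is back in effect
for the text that follows them:
<verb>
[<rf>sin&et;rf><sup>2&et;sup>&ero;alpha + <rf>cos&et;rf><sup>2&et;sup>&ero;alpha = 1]
[<ovl>x+y&et;ovl> = <unl>z&et;unl>]
[x &ero;ge 0 <phr>for all&et;phr> x &ero;isin <fi>N&et;fi>]
</verb>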
<sect1>Definitions, Lemmas and Theorems</>
<p>
There are a number of elements useful for representing
<em>definitions</> (<tt>def</>), <em>propositions</> (<tt>prop</>),
<em>lemmas</> (<tt>lemma</>), <em>corollaries</> (<tt>coroll</>),
<em>proofs</> (<tt>proof</>), and <em>theorems</> (<tt>theorem</>).
<code>
<!element def - - (thtag?, p+) >
<!element prop - - (thtag?, p+) >
<!element lemma - - (thtag?, p+) >
<!element coroll - - (thtag?, p+) >
<!element proof - - (p+) >
<!element theorem - - (thtag?, p+) >
<!element thtag - - (%inline)>
<!usemap global (def,prop,lemma,coroll,proof,theorem)>
<!usemap oneline thtag>
</code>
With the exception of <tt>proof</>, these all have the same
structure: an optional <tt>thtag</> followed by some paragraph level
elements. Here is an example:
<theorem><thtag>Alexander's Theorem</>
Let [<fi/G/] be a set of nontrivially achievable
subgoals and &lt an order on [<fi/G/]. &lt is
abstractly indicative if and only if it is a
linearization of [<lim>&lt <ll> <fi/G/ <ul>&ast </lim>].
</theorem>
This was typed as:
<verb>
<theorem><thtag>Alexander's Theorem&et>
Let [<fi/G/] be a set of nontrivially achievable
subgoals and &ero;lt an order on [<fi/G/]. &ero;lt
is abstractly indicative if and only if it is a
linearization of
[<lim>&ero;lt <ll> <fi/G/ <ul> &ero;ast &et;lim>].
&et;theorem>
</verb>
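The other elements are used in the same way. A sketch of a lemma
together with its proof (the statement is only an illustration) might
read:
<verb>
<lemma><thtag>Monotonicity&et;thtag>
If [a &ero;le b] then [a+c &ero;le b+c].
&et;lemma>
<proof>
Add [c] to both sides of [a &ero;le b].
&et;proof>
</verb>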
<sect1> The <tt>global</> Short Reference Map
<p>
The <tt>global</> short reference map, which is the default map in
effect within <tt>qwertz</> documents, allows the &dquot symbol to be
used to start a <em>short quote</> (<tt>sq</>) and &lsqb to start a
<em>formula</> (<tt>f</>). Also, &tilde is used for non-breaking
spaces. The rest of the short references just serve to hide any
special meaning &TeX gives these characters, allowing them to be
directly typed without having to use entity references.
<code>
<!entity qtag '<sq>' >
<!shortref global
"&ero;#RS;B" null -- delete leading blanks --
'"' qtag
"[" ftag
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>
<!usemap global qwertz>
</code>
<sect>Cross References</>
<p>
Places within a document can be marked using the <tt>label</>
element. Labels have an <tt>id</> attribute for naming the label.
The SGML parser will check that these identifiers are unique within
the document, and that they are referenced. That is, the parser will
complain if there is no reference to a label. For this reason, labels
should probably be created on demand, rather than in anticipation of
the need for a reference to the element.
There are two kinds of references: <tt>ref</> for
references to the number of some element, such as a section, figure or
theorem, and <tt>pageref</>, for references to the number of the page
on which the text around the label occurs when the document is
printed. Both types of references have an <tt>id</> attribute for
stating the identifier of the label being referenced. The number of
the element or page will be printed at the place of the <tt>ref</> or
<tt>pageref</>.
<code>
<!element label - o empty>
<!attlist label id cdata #required>
<!element ref - o empty>
<!attlist ref
id cdata #required>
<!element pageref - o empty>
<!attlist pageref
id cdata #required>
</code>
For example, a reference to the section on miscellaneous elements
of this manual, section <ref id=misc>, would be typed as:
<verb>
... section <ref id=misc>, would be ...
</verb>
The label itself was typed as:
<verb>
<sect><heading><label id="misc">
Miscellaneous Elements&et>
</verb>
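A page reference to the same label is typed analogously. As a sketch:
<verb>
... section <ref id=misc> on page <pageref id=misc> ...
</verb>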
<sect><heading><label id="misc">Miscellaneous Elements</>
<p>
There are just a couple of general-purpose elements remaining to be
discussed, which have not yet found a suitable home
elsewhere in this manual.
Editorial comments and reminders to oneself can be marked with the
<tt>comment</> tag. These comments will be printed using a different
type style than the body of the text. In the <tt>qwertz</> mapping
into &TeX, they are printed using the <sl>slanted type style</>.
If you do not want the comment to be printed, use the standard
SGML notation for comments instead: <tt><!-- &hellip --></tt>.
Finally, there is an "escape" element, allowing you to include raw
formatting code at any place in your document, the <tt>x</> element. This
code will be passed on to the formatter, such as &TeX;, inline, at the point it appears in
your document. Of course, this "feature" should be used judiciously, as it limits the formatter
independence of the document.
<code>
<!element comment - - (%inline)>
<!element x - - ((#pcdata | mc)*) >
<!usemap #empty x >
</code>
Notice that math character (<tt>mc</>) elements may appear within
<tt>x</> elements. This allows you to use SGML entity references for
math characters, to help avoid having to remember both the SGML and
the formatter's names for these symbols. Other entities may also be used, so
long as they expand to character data.
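The following sketch shows both elements in use. The command
<tt>\vspace</> is ordinary &LaTeX;, and the text of the comment is
invented for illustration:
<verb>
<comment>Check this paragraph against the final DTD.&et;comment>
<x>\vspace{1cm}&et;x>
</verb>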
<sect>Articles, Reports and Books</>
<p>
Articles, reports and books are structurally very similar. They
may be formatted differently, of course, but this is of little
importance during the writing phase, which is of primary interest to authors.
Seen abstractly, each type of document consists of a <em>title
page</>, for such information as the title of the document, the names
of the authors and so on, followed perhaps by an <em>abstract</>, and
then by a sequence of <em>chapters</> or <em>sections</>. There may
be <em>citations</>, which are references to documents listed at the
end, in a <em>bibliography</>. Perhaps there are one or more
<em>appendices</>. Finally, these documents may also contain
<em>footnotes</>.
Let us first precisely describe the overall structure of these document
types, before moving on to describe their various components. The
article element is defined as:
<code>
<!element article - -
(titlepag, header?, abstract?,
toc?, lof?, lot?, p*, sect*,
(appendix, sect+)?, biblio?) +(footnote)>
<!attlist article
opts cdata "null">
</code>
The <em>options</> attribute (<tt>opts</>) of <tt>article</>
provides a place to state <em>formatting</> options, which are passed
on to &LaTeX;. The particular options available depend on the
installation of &LaTeX being used, but the following should
always be available:
<descrip>
<tag><tt>11pt, 12pt.</></tag>
Set the "normal" font size to eleven, or twelve, point, instead of
the default 10 point size.
<tag><tt>twoside.</></tag>
Formats the document for printing on both sides of a page.
<tag><tt>twocolumn.</></tag>
Formats the document with two columns per page, as is common in
the proceedings of scientific conferences, for example.
<tag><tt>titlepage.</></tag>
Causes the title page and abstract to be printed on a
separate page.
</descrip>
Other options which may be supported include:
<descrip>
<tag><tt>dina4.</></>
Formats the document for printing on <bf/DIN A4/ size paper. (As
this is the size paper used at our installation, this option is
included automatically during the translation.)
<tag><tt>german.</></tag>
Causes the &TeX hyphenation algorithm to "think German", and
sections, bibliographies and such to be labelled using the
appropriate German terms.
<tag><tt>times, bookman, palatino &hellip;</tt></>
Causes the "main" font to be the selected PostScript font, instead
of the standard &TeX font, Computer Modern, and maps all other type faces
to some suitable PostScript font or type style.
</descrip>
For example, the starting tag for some article might be:
<verb>
<article opts="bookman,11pt">
</verb>
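Putting the content model and the options together, the skeleton of a
complete article might look like the following sketch. The title, the
author, the section headings and the bibliography file name
<tt>refs</> are all placeholders:
<verb>
<article opts="11pt,twoside">
<title>A Short Article
<author>A. N. Author
<abstract>
<p>
This is the abstract.
&et;abstract>
<toc>
<sect>Introduction
<p>
The first paragraph of the introduction.
<sect>Conclusion
<p>
Some closing remarks.
<biblio files="refs">
&et;article>
</verb>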
Reports are just like articles, except that they consist of a
sequence of chapters (<tt>chapt</>), instead of sections
(<tt>sect</>):
<code>
<!element report - -
(titlepag, header?, abstract?, toc?, lof?, lot?, p*,
chapt*, (appendix, chapt+)?, biblio?) +(footnote)>
<!attlist report
opts cdata "null">
</code>
Books are similar to reports, except that they may not include an
abstract:
<code>
<!element book - -
(titlepag, header?, toc?, lof?, lot?, p*, chapt*,
(appendix, chapt+)?, biblio?) +(footnote) >
<!attlist book
opts cdata "null">
</code>
The options attribute (<tt>opts</>) for <tt>report</> and
<tt>book</> elements is the same as that for articles, just described,
except for the <tt>titlepage</> option, which is applicable only to
articles.
The rest of this chapter describes the common elements of articles,
reports and books, starting with title pages.
<sect1>Title Pages</>
<p>
A title page (<tt>titlepag</>) consists of a title, a number of
authors (<tt>author</>) and an optional date (<tt/date/). The title
may refer to a footnote and may also include a <tt>subtitle</>. If
the date element is omitted, today's date will be printed by default.
To avoid having a date printed, include an empty <tt/date/ element.
<code>
<!element titlepag o o (title, author, date?)>
<!element title - o (%inline, subtitle?) +(newline)>
<!element subtitle - o (%inline)>
<!usemap oneline titlepag>
</code>
The <tt>author</> element includes the <tt>name</> and, optionally,
institution (<tt>inst</>) of the author. If there are multiple
authors, these are separated with the <tt>and</> tag. Also,
acknowledgements can be expressed using the <tt>thanks</> element.
These are formatted by &LaTeX as footnotes on the title page.
<code>
<!element author - o (name, thanks?, inst?,
(and, name, thanks?, inst?)*)>
<!element name o o (%inline) +(newline)>
<!element and - o empty>
<!element thanks - o (%inline)>
<!element inst - o (%inline) +(newline)>
<!element date - o (#pcdata)>
<!usemap global thanks>
</code>
Within the <tt>titlepag</>, the <tt>title</>, <tt>subtitle</>,
<tt>author</> and <tt/inst/ elements can be broken into multiple lines
using the <tt>newline</> element or, if you prefer, the <tt>nl</>
entity.
<code>
<!element newline - o empty >
<!entity nl "<newline>">
</code>
The title page of this manual was typed as:
<verb>
<title>The <tt/qwertz/ SGML Document Types
<subtitle>(Version 1.1 Reference Manual)
<author>Tom Gordon
<inst> Institute for Applied Information Technology (F3) &ero;nl&ero;nl
German National Research Center &ero;nl
for Computer Science (GMD)
</verb>
Notice that the <tt>titlepag</> tags are optional. The simplest title
page would include a title and author:
<verb>
<title> A Very Short Title Page
<author> Snoopy
</verb>
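A title page with two authors might be sketched as follows. All names,
the acknowledgement and the date are invented, and each element is kept
on a line of its own, as in the title page of this manual:
<verb>
<title>On the Armadillo Problem
<subtitle>A Progress Report
<author>A. N. Author
<thanks>Supported by a hypothetical grant.
<inst>First Institute
<and>B. N. Author
<inst>Second Institute
<date>November 1993
</verb>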
<sect1>Abstracts</>
<p>
Articles and reports, but not books, may have an abstract, which
consists of one or more paragraphs, including the various kinds of
lists, mathematical formulas and elements for literate programming:
<code>
<!element abstract - - (p+)>
</code>
<sect1>Table of Contents</>
<p>
There are three elements for stating whether or not a table of
contents, list of figures or list of tables should be included in the
document. These tables and lists are generated by &LaTeX;, so
these elements are empty: they only
specify that the list or table should be included.
<code>
<!element toc - o empty>
<!element lof - o empty>
<!element lot - o empty>
</code>
<sect1>Headers</>
<p>
A <tt>header</> element specifies what should be printed at the top
of each page. It consists of a left heading (<tt>lhead</>) and a
right heading (<tt>rhead</>). Both elements are required, if a header is
used at all, but either may be left empty, so that the effect of
having only a left or right heading can be achieved easily enough.
<code>
<!element header - - (lhead, rhead) >
<!element lhead - o (%inline)>
<!element rhead - o (%inline)>
</code>
As we will see, an initial header can be given after the title
page. Afterwards, a new header can be given for each new chapter or
section. The header printed on a page is the one which is in effect
at the end of that page, so the header will be that of
the last section starting on the page.
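For example, a header with only a left heading could be sketched like
this, leaving the required <tt>rhead</> element empty (the heading
text is a placeholder):
<verb>
<header>
<lhead>The qwertz DTD
<rhead>
&et;header>
</verb>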
<sect1>Sectioning</>
<p>
The naming scheme we have adopted for sections is a bit different
from that of &LaTeX;, because the names of SGML identifiers may be
at most eight characters long. But we think the scheme we have
chosen has its advantages. In books and reports, the top-level
sectional unit is the <em>chapter</> (<tt>chapt</>). In articles, it
is the <em>section</> (<tt>sect</>). The lower sectional units are
<tt>sect1</>, <tt>sect2</>, <tt>sect3</>, and <tt>sect4</>, in that
order.
Each section (or chapter) consists of a <tt>heading</>, followed by
an optional <tt>header</>, a number of paragraphs (including such things as
graphics), and then sections of the next lower level.
<code>
<!entity % sect "heading, header?, p* " >
<!element heading o o (%inline)>
<!element chapt - o (%sect, sect*) +(footnote)>
<!element sect - o (%sect, sect1*) +(footnote)>
<!element sect1 - o (%sect, sect2*)>
<!element sect2 - o (%sect, sect3*)>
<!element sect3 - o (%sect, sect4*)>
<!element sect4 - o (%sect)>
<!usemap oneline (chapt,sect,sect1,sect2,sect3,sect4)>
</code>
Don't confuse the headers with headings. The <tt>heading</> is
just the text printed at the point where the section begins, naming
the section. The <tt>header</> changes the text printed at the top of
each page.
If there are cross references to the section, put the
<tt>label</> in the heading. For example, you could type:
<verb>
<sect><heading><label id=mysect>My First Section&et>
</verb>
If a label isn't required, you can leave the <tt>heading</> tag
implicit:
<verb>
<sect>My First Section
</verb>
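Lower sectional units are nested simply by starting them within the
unit they belong to. A sketch, with placeholder headings and text:
<verb>
<sect>Methods
<p>
An overview of the methods.
<sect1>Data Collection
<p>
How the data were collected.
<sect2>Sampling
<p>
Details of the sampling procedure.
<sect1>Analysis
<p>
How the data were analysed.
</verb>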
The <tt>appendix</> element marks the beginning of a sequence of
appendices. These are chapters or sections, depending on whether the
document is an article, report or book, and differ from ordinary
chapters or sections only in the way they are numbered, and of course
their placement at the end of the document.
<code>
<!element appendix - o empty >
</code>
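In an article, for instance, the end of the document might be sketched
as follows (the headings are placeholders). Every section after the
<tt>appendix</> tag is treated as an appendix:
<verb>
<sect>Conclusion
<p>
...
<appendix>
<sect>Proofs
<p>
...
</verb>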
<sect1>Footnotes</>
<p>
The tag for footnotes is, simply enough,
<tt>footnote</>.<footnote>To be sure the marker for the footnote is
formatted properly, do not leave a space between the
character after which the footnote marker is to appear and the
beginning of the footnote element itself.</>
<code>
<!element footnote - - (%inline)>
<!usemap global footnote>
</code>
Footnotes can appear anywhere within a section (or chapter). The
<tt>usemap</> declaration is required to cancel the <tt>oneline</> map
used in title pages.
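A sketch of a footnote attached directly to the preceding period, with
no intervening space:
<verb>
SGML is an international standard.<footnote>ISO 8879:1986.&et;footnote>
</verb>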
<sect1>Citation</>
<p>
Literature references can be made using the <tt>cite</> and
<tt>ncite</> elements. The only difference between them is that the
<tt>ncite</> allows a short <em>note</> to be included in the
reference, for such things as page numbers.
<code>
<!element cite - o empty>
<!attlist cite
id cdata #required>
<!element ncite - o empty>
<!attlist ncite
id cdata #required
note cdata #required>
</code>
For example, one might type
<verb>
<ncite id="Bryan88" note="pg. 68">
</verb>
to refer to page 68 of Martin Bryan's
book on SGML. This would appear, using &LaTeX, as <ncite id="Bryan88"
note="pg. 68"> in the printed document.
The <tt>id</> attribute of a <tt>cite</> or <tt>ncite</> is a
reference to an identifier of a Bib&TeX bibliography file. There is a
<tt>qwertz</> SGML document type for creating such bibliographies,
described below.
The bibliography itself, or list of references, is generated by
including a <tt>biblio</> element at the end of the document, after
any appendices.
<code>
<!element biblio - o empty>
<!attlist biblio
style cdata "qwertz"
files cdata "">
</code>
The <tt>files</> attribute of <tt>biblio</> is a list of the names
of the bibliographies used, separated by commas. The names should not
include any file suffixes, such as ".bib" or ".sgml". For example,
to cite publications on artificial intelligence and cognitive
science, where the bibliographies are maintained in two files,
<tt>ai.sgml</> and <tt>cogsci.sgml</>, you would type:
<verb>
<biblio files="ai,cogsci">
</verb>
The <tt>style</> attribute determines how the bibliography is
formatted. Five styles are supported:
<descrip>
<tag><tt>plain</>
Entries are sorted alphabetically and labeled with numbers.
<tag><tt>unsrt</>
The same as <tt>plain</> except the entries are ordered as they
appear in the document, rather than alphabetically.
<tag><tt>alpha</>
The same as <tt>plain</>, except that labels are made from the author's
name and the year of publication.
<tag><tt>abbrv</>
The same as <tt>plain</> except that first names, month names, and
journal names are abbreviated.
<tag><tt>qwertz</>
The same as <tt>plain</> except that all words of the entry are
capitalized exactly as they appear in the source file of the
bibliography. The <tt>plain</> style applies capitalization rules
which are inappropriate, e.g., for German titles.
</descrip>
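To select a style other than the default <tt>qwertz</>, state it
explicitly. A sketch, reusing the two bibliography files from the
example above:
<verb>
<biblio style="alpha" files="ai,cogsci">
</verb>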
<sect>Slides
<p>
The <tt>slides</> element is for making a series of slides or, more
commonly, overhead transparencies. Although you may often prefer to
use some other program for preparing presentations, this approach has
its advantages when you want to include parts of an existing article
or book on your transparencies. You can just "cut and paste" the SGML
source from an article onto a slide. You may also prefer this
approach if your presentation includes mathematical formulas, to be
able to take advantage of &TeX's excellent mathematics typesetting.
<code>
<!element slides - - (slide*) >
<!attlist slides
opts cdata "null">
</code>
Each slide consists of an optional title, followed by one or more
paragraphs (<tt>p</>):
<code>
<!element slide - o (title?, p+) >
</code>
Notice that not every element available in an article or book is also
available here. In particular, there are no sectioning elements,
cross references, footnotes or a bibliography.<footnote>Our
translation into &TeX does not use Sli&TeX;, so as to allow slides to
include tables and figures. </footnote>
The <tt>title</> element will be centered on the line. You can break
up the title into multiple lines with <tt>newline</> elements. The
various type style elements, such as <tt>em</> and <tt>bf</>, can also
be used here; indeed anywhere on a slide.
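A sketch of a short presentation with two slides follows. The titles
and formulas are chosen only for illustration, and we assume here that
formulas may appear within the paragraphs of a slide just as they do
in an article:
<verb>
<slides>
<slide>
<title>Fractions
<p>
A simple fraction: [<fr><nu>1<de>2&et;fr>]
<slide>
<title>Sums
<p>
<dm>
<sum><ll>i=1<ul>n<opd>i&et;sum> = <fr><nu>n(n+1)<de>2&et;fr>
&et;dm>
&et;slides>
</verb>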
<sect> Letters and Electronic Messages
<p>
The <tt>letter</> element is for making letters and e-mail
messages. Just how a letter is formatted may depend on whether it is
a business or personal letter. If it is a business letter, it may be
printed to appear as if the company's letterhead stationery had been
used.
The structure of a letter can be quite complex, but most of the
elements to be described here are optional. Using an example from
<cite id="Lamport86">, a simple letter would be typed like this:
<verb>
<letter>
<from>
R. (Ma) Dillo
<address> 1234 Ave.~of the Armadillos &ero;nl
Gnu York, G.Y. 56789
<to>
Dr.~G. Nathaniel Picking
<address> Acme Exterminators &ero;nl
33 Swat Street &ero;nl
Hometown, Illinois 62301
<cc> Jimmy Carter &ero;nl
Richard M. Nixon
<opening> Dear Nat,
I'm afraid that the armadillo problem is still
with us. I did everything ...
... and I hope we can get rid of the nasty beasts
this time.
<closing> Best regards,
&et;letter>
</verb>
The <tt>from</> and <tt>to</> elements are for the sender's and
receiver's names and addresses, respectively. The address may be
either a street address, using <tt/address/, or an electronic mail
address, using <tt/email/, or both. You may also include a telephone
number, using the <tt/phone/ element. (If you are using your company's
letterhead stationery, it may be that you should type only your
extension, rather than your complete telephone number.) Finally, a
telefax number can be provided, using the <tt/fax/ element.
Notice that in the <tt>closing</> you must type a comma yourself,
if you want one. Also, do not type your name again after the closing;
the <tt>name</> of the sender will be printed after the closing as
expected.
There are several optional elements which may be of interest:
<descrip>
<tag><tt>subject</>
For the purpose or, well, subject of the letter. If you would
like this subject line to appear as "re: &hellip", for example, you
must type the "re: " yourself, as part of the subject.
<tag><tt>sref, rref, rdate</>
These are tags for the <em>sender's reference</>, <em>receiver's
reference</> and <em>receiver's date</> where you can include whatever
code is used by your, or the recipient's, company or institution to
uniquely identify letters. For example, if this letter is a response
to some other letter, you may use the <tt>rref</> and <tt>rdate</>
elements to identify the original letter. There is no <tt>sdate</>
tag, as the date this letter is printed will be included in the letter
at some appropriate place by the formatter.
<tag><tt>cc</>
This used to be an acronym for "carbon copies", which were to be
sent to persons other than the principal recipient of the letter. The
<tt>cc</> tag can be used to list these other recipients, even though
the copies they receive today are perhaps printed by a laser printer
on recycled paper. As in the above example, you can separate the
names of these recipients with <tt>newline</> elements (using the
<tt>nl</> entity if you prefer).
<tag><tt>encl</>
Use this tag to list <em>enclosures</>. These can also be separated
with <tt>newline</> elements, or simply with commas, if you prefer.
<tag><tt>ps</>
A postscript, not to be confused with PostScript, can be included
with this tag. Any kind of element which can appear in the body
of the letter (i.e. <tt>sectpar</> elements) can also be used here.
</descrip>
To summarize, here are the relevant SGML declarations:
<code>
<!entity % addr "(address?, email?, phone?, fax?)" >
<!element letter - -
(from, %addr, to, %addr, cc?, subject?, sref?, rref?,
rdate?, opening, p+, closing, encl?, ps?)>
<!attlist letter
opts cdata "null">
<!element from - o (#pcdata) >
<!element to - o (#pcdata) >
<!usemap oneline (from,to)>
<!element address - o (#pcdata) +(newline) >
<!element email - o (#pcdata) >
<!element phone - o (#pcdata) >
<!element fax - o (#pcdata) >
<!element subject - o (%inline;) >
<!element sref - o (#pcdata) >
<!element rref - o (#pcdata) >
<!element rdate - o (#pcdata) >
<!element opening - o (%inline;) >
<!usemap oneline opening>
<!element closing - o (%inline;) >
<!element cc - o (%inline;) +(newline) >
<!element encl - o (%inline;) +(newline) >
<!element ps - o (p+) >
</code>
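As a sketch of how the optional elements fit into this content model,
here is a hypothetical letter. All names, addresses and reference
codes are invented for illustration, and it is assumed that the
<tt>nl</> entity behaves as described above.
<verb>
<letter>
<from>Pat Sender
<address>1 Example Street &ero;nl; Sampletown
<email>pat@example.org
<to>Lee Receiver
<address>2 Sample Road &ero;nl; Exampleville
<cc>A. Colleague &ero;nl; B. Colleague
<subject>re: Your enquiry
<rref>XY-123
<rdate>1 November 1993
<opening>Dear Lee Receiver,
<p>Thank you for your letter. The price list you asked for is enclosed.
<closing>Sincerely,
<encl>Price list, order form
<ps><p>Please note our new telephone extension.
&et;letter>
</verb>
Notice that the optional elements must appear in the order given in
the declaration: <tt>cc</> and <tt>subject</> come before the
<tt>opening</>, while <tt>encl</> and <tt>ps</> follow the
<tt>closing</>.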
<sect> Telefax Messages
<p>
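The structure of a telefax message is the same as for letters and
e-mail messages, except that the <tt/fax/ number of the recipient is,
of course, required, rather than optional. As the declaration shows,
the recipient's <tt/address/ is required as well, and there is no
<tt/encl/ element.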
<code>
<!element telefax - -
(from, %addr, to, address, email?,
phone?, fax, cc?, subject?,
sref?, rref?, rdate?,
opening, p+, closing, ps?)>
<!attlist telefax
opts cdata "null"
length cdata "2">
</code>
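A minimal telefax might then look as follows. This is only a sketch:
the names and numbers are invented, and the attributes are left at
their defaults.
<verb>
<telefax>
<from>Pat Sender
<fax>555 0101
<to>Lee Receiver
<address>2 Sample Road &ero;nl; Exampleville
<fax>555 0102
<subject>Meeting agenda
<opening>Dear Lee,
<p>The agenda for the meeting follows by post.
<closing>Regards,
&et;telefax>
</verb>
Here the sender gives only a telefax number, which is permitted,
whereas the recipient's <tt/address/ and <tt/fax/ must both be present.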
<sect> Notes
<p>
The <tt/notes/ element is a new top-level document "style", like
articles, books and letters. It is useful for miscellaneous purposes,
such as jotting down notes to oneself, where the complex structure of
the other styles is unnecessary. Notes are simply a sequence of
section paragraphs (i.e. paragraphs, lists, comments, long quotations,
figures, tables, displayed mathematical formulas, and program code).
An optional title is also available. The contents of a notes document
can be copied and pasted into a section or chapter of a book or article.
<code>
<!element notes - - (title?, p+) >
<!attlist notes
opts cdata "null" >
</code>
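For example, a small notes document could be typed like this. The
content is invented, and the <tt/itemize/ and <tt/item/ elements are
assumed to be used as declared earlier in this DTD.
<verb>
<notes>
<title>Things To Do
<p>The printer in the hall needs attention.
<p>
<itemize>
<item>Order toner.
<item>Renew the maintenance contract.
&et;itemize>
&et;notes>
</verb>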
<sect> Manual Pages
<p>
The <tt>manpage</> element is for Unix manual pages. Here we see again
an advantage of SGML. Using this element, the very same manual
page can be viewed on just about every terminal, using <tt>nroff</>,
or be included as a section of an article, report or book to be
formatted by &TeX;.
<code>
<!element manpage - - (sect1*)
-(sect2 | f | %mathpar | figure | tabular |
table | %xref | %thrm )>
<!attlist manpage
opts cdata "null"
title cdata ""
sectnum cdata "1" >
</code>
A manpage consists of a sequence of sections. Besides <tt>opts</>,
there are two SGML attributes, <tt>title</> and <tt>sectnum</>, for
the command name and manual section number,
respectively. Each section of the manual page is delimited by a
<tt>sect1</> element. <em/Notice that these sections may not contain
further subsections./ Sections are represented as <tt>sect1</>
elements, rather than <tt>sect</>, to allow the manual page to be
easily cut and pasted into a <tt>sect</> section of an article, report
or book. (Of course, if the manual page is to be used as a chapter of a
book, then these sections of the manual page will need to be replaced
with <tt>sect</> elements.)
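As a sketch of this cut and paste, the sections of a manual page
could be placed under a <tt>sect</> of an article as follows. The
section heading is invented for illustration; the <tt>sect1</>
elements are copied unchanged from the manual page.
<verb>
<sect> The cd Command
<p> This section reproduces the manual page.
<sect1> NAME
<p> cd &ero;mdash change working directory
<sect1> SYNOPSIS
<p> cd [ <em>directory&et;> ]
</verb>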
Notice that many elements, such as tables, figures and
mathematical formulas, cannot be used within manual pages, because of
limitations of ASCII terminals, or the Unix <tt/man/ macro package for
<tt/nroff/.
There is a short reference map in effect within the scope of the
<tt>manpage</>. With the exception of [, which is not used here
to start formulas, this map <em>has the same effect</> as the
<tt>global</> map.
<code>
<!shortref manpage
"&ero;#RS;B" null
'"' qtag
"[" ftag
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>
<!usemap manpage manpage >
</code>
<sect1> Manual Page Conventions
<p>
For detailed information about the conventions for Unix manual
pages, see your Unix documentation. But here is a brief summary. The
typical manual page has the following sections, in this order:
<descrip>
<tag> NAME.
The name, or list of names, by which the command or function is
called, followed by a dash and then a one-line summary of its purpose.
<tag> SYNOPSIS.
For the syntax of the command and its arguments. (The Sun
documentation suggests that literals be formatted using boldface type,
and that variables be formatted using italic type. Use the <tt>tt</>
and <tt>em</> elements, respectively, here for this purpose.)
<tag> DESCRIPTION.
An overview of the command or function's purpose, effects and use.
<tag> OPTIONS.
A list and description of all command-line options.
<tag> FILES.
A list of files associated with the command which may be of
interest to users.
<tag> SEE ALSO.
A comma-separated list of related Unix commands, and references to
other relevant publications.
<tag> DIAGNOSTICS.
A list and explanation of any diagnostic messages the command may
write to the standard error output file.
<tag> BUGS.
A description of any known bugs, problems, or limitations.
</descrip>
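Typed as <tt>sect1</> elements, these conventional sections give a
skeleton like the following. This is only a sketch: the command name
<tt>mycmd</> is hypothetical, and the ellipses stand for text you must
supply.
<verb>
<manpage title="MYCMD" sectnum="1">
<sect1> NAME
<p> mycmd &ero;mdash one-line summary
<sect1> SYNOPSIS
<p> ...
<sect1> DESCRIPTION
<p> ...
<sect1> OPTIONS
<p> ...
<sect1> FILES
<p> ...
<sect1> SEE ALSO
<p> ...
<sect1> DIAGNOSTICS
<p> ...
<sect1> BUGS
<p> ...
&et;manpage>
</verb>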
Some of you may be asking yourselves why <tt>manpage</> wasn't
designed so that each of these conventional sections of a manual page
is represented by its own SGML element. That certainly would have
been possible, but on the other hand the approach taken has the
advantage that users can simply cut and paste sections between manual
pages and articles, reports and books. Of course it would have been
easy to write a filter to convert between these formats, but it was
felt that the benefits of a special <tt>manpage</> format would be too
small to warrant even this limited effort. After all, unless they are
using an SGML structure editor, users must refer to the SGML document
type definition to know what is expected in the manual page. It is
just as easy to check this documentation to see what sections
conventionally appear in manual pages. There is also a file which can
be used as a template or form for writing manual pages. See the Unix
Commands chapter for details.
The only reason there is a <tt>manpage</> document type, instead
of just another translation of, say, the <tt>article</> document type
into <tt>nroff</>, is that the <tt>man</> macros used for the Unix
documentation are not powerful enough to format all of the features
available in our <tt>latex</> document type. Having this separate
<tt>manpage</> document type provides a means of checking whether the
manual page can be formatted by <tt>nroff</> using these <tt>man</>
macros. Again, as this document type is designed to be a subset of
the <tt>latex</> document type, the sections of a manual page can also
be included within instances of the <tt>latex</> document type.
<sect1> Manual Page Example
<p> Here is how the manual page for the <tt>cd</> command could have
been typed using this document type definition:
<verb>
<manpage title="CD">
<sect1> NAME
<p>cd &ero;mdash change working directory
<sect1> SYNOPSIS
<p> cd [ <em>directory&et;> ]
<sect1> DESCRIPTION
<p> <em>directory&et;> becomes the new working directory. The process
must have execute (search) permission in <em>directory&et;>. If cd is
used without arguments, it returns you to your login directory.
...
<sect1> SEE ALSO
<p> csh(1), pwd(1), sh(1)
&et;manpage>
</verb>
This is the end of the <tt>qwertz</> document type definition.
<code>
<!-- end of qwertz dtd -->
</code>